Overview

Dataset statistics

Number of variables: 16
Number of observations: 131250
Missing cells: 22740
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 21.1 MiB
Average record size in memory: 168.2 B
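
The missing-cell percentage can be cross-checked from the other figures in this report. The sketch below is illustrative arithmetic only, using the counts above and the per-column missing count listed under Alerts; the variable names are made up for the example.

```python
# Values copied from the dataset statistics and the "Missing" alerts.
n_variables = 16
n_observations = 131250
missing_cells = 22740
missing_per_column = 4548   # reported for each of five columns
columns_with_missing = 5

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells (%): {missing_pct:.1f}%")
# True when the five per-column counts account for every missing cell.
print(columns_with_missing * missing_per_column == missing_cells)
```

All missing cells are therefore accounted for by the five columns flagged as Missing.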

Variable types

Categorical: 5
DateTime: 2
Numeric: 8
Boolean: 1

Alerts

trip_distance is highly overall correlated with store_and_fwd_flag and 2 other fields (High correlation)
RatecodeID is highly overall correlated with tolls_amount (High correlation)
extra is highly overall correlated with Airport_fee (High correlation)
tolls_amount is highly overall correlated with RatecodeID (High correlation)
VendorID is highly overall correlated with improvement_surcharge (High correlation)
store_and_fwd_flag is highly overall correlated with trip_distance (High correlation)
improvement_surcharge is highly overall correlated with VendorID and 1 other fields (High correlation)
congestion_surcharge is highly overall correlated with trip_distance and 1 other fields (High correlation)
Airport_fee is highly overall correlated with trip_distance and 1 other fields (High correlation)
store_and_fwd_flag is highly imbalanced (94.0%) (Imbalance)
payment_type is highly imbalanced (55.8%) (Imbalance)
improvement_surcharge is highly imbalanced (95.5%) (Imbalance)
congestion_surcharge is highly imbalanced (69.2%) (Imbalance)
Airport_fee is highly imbalanced (70.8%) (Imbalance)
passenger_count has 4548 (3.5%) missing values (Missing)
RatecodeID has 4548 (3.5%) missing values (Missing)
store_and_fwd_flag has 4548 (3.5%) missing values (Missing)
congestion_surcharge has 4548 (3.5%) missing values (Missing)
Airport_fee has 4548 (3.5%) missing values (Missing)
trip_distance is highly skewed (γ1 = 262.2750407) (Skewed)
tip_amount has unique values (Unique)
passenger_count has 2160 (1.6%) zeros (Zeros)
trip_distance has 1990 (1.5%) zeros (Zeros)
extra has 38442 (29.3%) zeros (Zeros)
tolls_amount has 119444 (91.0%) zeros (Zeros)
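
The imbalance percentages above appear consistent with a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. That reading is an assumption, checked here against only two of the alerts using the value counts reported later for those variables; `imbalance_score` is a hypothetical helper, not a function of the profiling library.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Non-missing value counts copied from the variable sections of this report.
store_and_fwd_flag = [125819, 883]
payment_type = [101434, 22586, 4548, 1766, 916]

print(f"store_and_fwd_flag: {imbalance_score(store_and_fwd_flag):.1%}")
print(f"payment_type: {imbalance_score(payment_type):.1%}")
```

Both results round to the percentages shown in the alerts (94.0% and 55.8%).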

Reproduction

Analysis started: 2023-11-11 17:05:48.242166
Analysis finished: 2023-11-11 17:06:12.076019
Duration: 23.83 seconds
Software version: ydata-profiling v4.6.0
Download configuration: config.json
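
A report of this kind is typically produced with the library's `ProfileReport` class. The sketch below is a minimal example under assumptions: the function name `build_report`, the title string and the output path are placeholders, and the configuration actually used for this report (config.json) is not reproduced.

```python
def build_report(df, output_path="report.html"):
    """Profile a pandas DataFrame and write the HTML report to output_path."""
    # Imported lazily so the module loads even where ydata-profiling is absent.
    from ydata_profiling import ProfileReport

    profile = ProfileReport(df, title="Profiling Report")  # placeholder title
    profile.to_file(output_path)
    return profile
```

It would be called with an already loaded DataFrame, for example `build_report(trips_df)`, where `trips_df` is a hypothetical name for the trip records profiled here.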

Variables

VendorID
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 131250
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 95498 | 72.8%
0 | 35703 | 27.2%
2 | 49 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
1 | 95498 | 72.8%
0 | 35703 | 27.2%
2 | 49 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
1 95498
72.8%
0 35703
 
27.2%
2 49
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 131250
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 95498
72.8%
0 35703
 
27.2%
2 49
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 131250
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 95498
72.8%
0 35703
 
27.2%
2 49
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 131250
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 95498
72.8%
0 35703
 
27.2%
2 49
 
< 0.1%

tpep_pickup_datetime
DateTime

Distinct: 91592
Distinct (%): 69.8%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 MiB
Minimum: 2023-06-28 15:28:01
Maximum: 2023-07-01 00:58:11
Histogram with fixed size bins (bins=50)

tpep_dropoff_datetime
DateTime

Distinct: 91377
Distinct (%): 69.6%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 MiB
Minimum: 2023-06-28 15:32:43
Maximum: 2023-07-01 23:10:43
Histogram with fixed size bins (bins=50)

passenger_count
Real number (ℝ)

MISSING  ZEROS 

Distinct: 8
Distinct (%): < 0.1%
Missing: 4548
Missing (%): 3.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3580764
Minimum: 0
Maximum: 8
Zeros: 2160
Zeros (%): 1.6%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8924777
Coefficient of variation (CV): 0.65716309
Kurtosis: 9.4602262
Mean: 1.3580764
Median Absolute Deviation (MAD): 0
Skewness: 2.8667044
Sum: 172071
Variance: 0.79651645
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
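
Several of the descriptive statistics are derived from one another. The following sketch recomputes the coefficient of variation and the variance from the reported mean and standard deviation, and checks that the sum divided by the mean matches the number of non-missing rows. It is illustrative arithmetic on the published figures, not the tool's own code.

```python
# Values copied from the passenger_count statistics.
mean = 1.3580764
std = 0.8924777
total = 172071
n_observations = 131250
n_missing = 4548

cv = std / mean                        # coefficient of variation
variance = std ** 2
n_valid = n_observations - n_missing   # rows that contribute to the mean

print(f"CV: {cv:.5f}")
print(f"variance: {variance:.5f}")
print(f"sum / mean: {total / mean:.0f} (non-missing rows: {n_valid})")
```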
Common values

Value | Count | Frequency (%)
1 | 96306 | 73.4%
2 | 18281 | 13.9%
3 | 4513 | 3.4%
4 | 2751 | 2.1%
0 | 2160 | 1.6%
5 | 1490 | 1.1%
6 | 1199 | 0.9%
8 | 2 | < 0.1%
(Missing) | 4548 | 3.5%

Lowest values

Value | Count | Frequency (%)
0 | 2160 | 1.6%
1 | 96306 | 73.4%
2 | 18281 | 13.9%
3 | 4513 | 3.4%
4 | 2751 | 2.1%
5 | 1490 | 1.1%
6 | 1199 | 0.9%
8 | 2 | < 0.1%

Highest values

Value | Count | Frequency (%)
8 | 2 | < 0.1%
6 | 1199 | 0.9%
5 | 1490 | 1.1%
4 | 2751 | 2.1%
3 | 4513 | 3.4%
2 | 18281 | 13.9%
1 | 96306 | 73.4%
0 | 2160 | 1.6%

trip_distance
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 2802
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.6461714
Minimum: 0
Maximum: 135182.06
Zeros: 1990
Zeros (%): 1.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.48
Q1: 1.08
median: 1.84
Q3: 3.63
95-th percentile: 16.67
Maximum: 135182.06
Range: 135182.06
Interquartile range (IQR): 2.55

Descriptive statistics

Standard deviation: 456.06402
Coefficient of variation (CV): 80.774031
Kurtosis: 71610.377
Mean: 5.6461714
Median Absolute Deviation (MAD): 0.96
Skewness: 262.27504
Sum: 741059.99
Variance: 207994.39
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 1990 | 1.5%
1 | 1824 | 1.4%
1.2 | 1756 | 1.3%
0.8 | 1738 | 1.3%
1.1 | 1727 | 1.3%
0.9 | 1725 | 1.3%
0.7 | 1642 | 1.3%
1.4 | 1579 | 1.2%
1.3 | 1565 | 1.2%
1.5 | 1518 | 1.2%
Other values (2792) | 114186 | 87.0%

Lowest values

Value | Count | Frequency (%)
0 | 1990 | 1.5%
0.01 | 128 | 0.1%
0.02 | 81 | 0.1%
0.03 | 73 | 0.1%
0.04 | 34 | < 0.1%
0.05 | 47 | < 0.1%
0.06 | 35 | < 0.1%
0.07 | 34 | < 0.1%
0.08 | 27 | < 0.1%
0.09 | 18 | < 0.1%

Highest values

Value | Count | Frequency (%)
135182.06 | 1 | < 0.1%
92292.43 | 1 | < 0.1%
20314 | 1 | < 0.1%
9673.69 | 1 | < 0.1%
143.35 | 1 | < 0.1%
104.09 | 1 | < 0.1%
84.16 | 1 | < 0.1%
83.69 | 1 | < 0.1%
79.55 | 1 | < 0.1%
73.23 | 1 | < 0.1%
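
The extreme skewness of trip_distance is driven by a handful of very large values. One common way to flag such values, which the report itself does not apply, is Tukey's rule: anything more than 1.5 × IQR beyond the quartiles is treated as a potential outlier. The sketch below applies that rule to the reported quartiles and the ten largest values listed for this variable; it is an illustration under that assumption only.

```python
# Quartiles copied from the trip_distance quantile statistics.
q1, q3 = 1.08, 3.63
iqr = q3 - q1

# Tukey's rule: values beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The ten largest values listed in the report.
largest = [135182.06, 92292.43, 20314, 9673.69, 143.35,
           104.09, 84.16, 83.69, 79.55, 73.23]
flagged = [x for x in largest if x > upper_fence]

print(f"IQR: {iqr:.2f}, upper fence: {upper_fence:.3f}")
print(f"{len(flagged)} of {len(largest)} listed maxima exceed the fence")
```

Because the 95-th percentile (16.67) already lies above this fence, the rule is a coarse screen for a distribution this skewed rather than a definitive outlier test.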

RatecodeID
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 6
Distinct (%): < 0.1%
Missing: 4548
Missing (%): 3.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5172847
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 2
Maximum: 99
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 6.497366
Coefficient of variation (CV): 4.2822327
Kurtosis: 220.17059
Mean: 1.5172847
Median Absolute Deviation (MAD): 0
Skewness: 14.874864
Sum: 192243
Variance: 42.215765
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Common values

Value | Count | Frequency (%)
1 | 118931 | 90.6%
2 | 5517 | 4.2%
5 | 805 | 0.6%
99 | 558 | 0.4%
3 | 553 | 0.4%
4 | 338 | 0.3%
(Missing) | 4548 | 3.5%

Lowest values

Value | Count | Frequency (%)
1 | 118931 | 90.6%
2 | 5517 | 4.2%
3 | 553 | 0.4%
4 | 338 | 0.3%
5 | 805 | 0.6%
99 | 558 | 0.4%

Highest values

Value | Count | Frequency (%)
99 | 558 | 0.4%
5 | 805 | 0.6%
4 | 338 | 0.3%
3 | 553 | 0.4%
2 | 5517 | 4.2%
1 | 118931 | 90.6%

store_and_fwd_flag
Boolean

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 4548
Missing (%): 3.5%
Memory size: 5.3 MiB

Value | Count | Frequency (%)
False | 125819 | 95.9%
True | 883 | 0.7%
(Missing) | 4548 | 3.5%

PULocationID
Real number (ℝ)

Distinct: 264
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.86859
Minimum: 1
Maximum: 264
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 14
Q1: 67
median: 133
Q3: 199
95-th percentile: 251
Maximum: 264
Range: 263
Interquartile range (IQR): 132

Descriptive statistics

Standard deviation: 76.201612
Coefficient of variation (CV): 0.57351109
Kurtosis: -1.1970131
Mean: 132.86859
Median Absolute Deviation (MAD): 66
Skewness: -0.0077132294
Sum: 17439003
Variance: 5806.6857
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
8 | 558 | 0.4%
117 | 549 | 0.4%
247 | 548 | 0.4%
146 | 548 | 0.4%
93 | 545 | 0.4%
171 | 545 | 0.4%
43 | 544 | 0.4%
101 | 542 | 0.4%
162 | 540 | 0.4%
264 | 540 | 0.4%
Other values (254) | 125791 | 95.8%

Lowest values

Value | Count | Frequency (%)
1 | 500 | 0.4%
2 | 483 | 0.4%
3 | 513 | 0.4%
4 | 491 | 0.4%
5 | 509 | 0.4%
6 | 491 | 0.4%
7 | 493 | 0.4%
8 | 558 | 0.4%
9 | 502 | 0.4%
10 | 473 | 0.4%

Highest values

Value | Count | Frequency (%)
264 | 540 | 0.4%
263 | 483 | 0.4%
262 | 488 | 0.4%
261 | 494 | 0.4%
260 | 514 | 0.4%
259 | 499 | 0.4%
258 | 506 | 0.4%
257 | 502 | 0.4%
256 | 531 | 0.4%
255 | 493 | 0.4%

DOLocationID
Real number (ℝ)

Distinct: 264
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.8411
Minimum: 1
Maximum: 264
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 14
Q1: 67
median: 133
Q3: 199
95-th percentile: 252
Maximum: 264
Range: 263
Interquartile range (IQR): 132

Descriptive statistics

Standard deviation: 76.16586
Coefficient of variation (CV): 0.57336066
Kurtosis: -1.1969142
Mean: 132.8411
Median Absolute Deviation (MAD): 66
Skewness: -0.0037655891
Sum: 17435394
Variance: 5801.2382
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
263 | 575 | 0.4%
127 | 571 | 0.4%
167 | 548 | 0.4%
98 | 546 | 0.4%
247 | 543 | 0.4%
14 | 543 | 0.4%
226 | 542 | 0.4%
115 | 541 | 0.4%
153 | 540 | 0.4%
231 | 539 | 0.4%
Other values (254) | 125762 | 95.8%

Lowest values

Value | Count | Frequency (%)
1 | 476 | 0.4%
2 | 502 | 0.4%
3 | 456 | 0.3%
4 | 515 | 0.4%
5 | 496 | 0.4%
6 | 477 | 0.4%
7 | 510 | 0.4%
8 | 476 | 0.4%
9 | 463 | 0.4%
10 | 478 | 0.4%

Highest values

Value | Count | Frequency (%)
264 | 522 | 0.4%
263 | 575 | 0.4%
262 | 487 | 0.4%
261 | 514 | 0.4%
260 | 483 | 0.4%
259 | 509 | 0.4%
258 | 460 | 0.4%
257 | 503 | 0.4%
256 | 512 | 0.4%
255 | 481 | 0.4%

payment_type
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 MiB

Length

Max length: 11
Median length: 11
Mean length: 9.5125029
Min length: 3

Characters and Unicode

Total characters: 1248516
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
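
The character-level totals for payment_type follow directly from the category counts. As a rough cross-check, the sketch below recomputes the total number of characters, the mean length and the number of distinct characters from the five category labels and their counts as reported under Common Values; it is plain arithmetic, not the library's implementation.

```python
# Category counts copied from the payment_type common values.
counts = {
    "Credit Card": 101434,
    "Cash": 22586,
    "Wallet": 4548,
    "unknown": 1766,
    "UPI": 916,
}

n_rows = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / n_rows
distinct_chars = set("".join(counts))  # includes the space in "Credit Card"

print(f"rows: {n_rows}")
print(f"total characters: {total_chars}")
print(f"mean length: {mean_length:.7f}")
print(f"distinct characters: {len(distinct_chars)}")
```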

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Credit Card
2nd row: Credit Card
3rd row: Credit Card
4th row: Credit Card
5th row: Credit Card

Common Values

Value | Count | Frequency (%)
Credit Card | 101434 | 77.3%
Cash | 22586 | 17.2%
Wallet | 4548 | 3.5%
unknown | 1766 | 1.3%
UPI | 916 | 0.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
credit | 101434 | 43.6%
card | 101434 | 43.6%
cash | 22586 | 9.7%
wallet | 4548 | 2.0%
unknown | 1766 | 0.8%
upi | 916 | 0.4%

Most occurring characters

ValueCountFrequency (%)
C 225454
18.1%
d 202868
16.2%
r 202868
16.2%
a 128568
10.3%
e 105982
8.5%
t 105982
8.5%
i 101434
8.1%
(space) 101434
8.1%
s 22586
 
1.8%
h 22586
 
1.8%
Other values (10) 28754
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 914332
73.2%
Uppercase Letter 232750
 
18.6%
Space Separator 101434
 
8.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
d 202868
22.2%
r 202868
22.2%
a 128568
14.1%
e 105982
11.6%
t 105982
11.6%
i 101434
11.1%
s 22586
 
2.5%
h 22586
 
2.5%
l 9096
 
1.0%
n 5298
 
0.6%
Other values (4) 7064
 
0.8%
Uppercase Letter
ValueCountFrequency (%)
C 225454
96.9%
W 4548
 
2.0%
U 916
 
0.4%
P 916
 
0.4%
I 916
 
0.4%
Space Separator
ValueCountFrequency (%)
(space) 101434
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1147082
91.9%
Common 101434
 
8.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 225454
19.7%
d 202868
17.7%
r 202868
17.7%
a 128568
11.2%
e 105982
9.2%
t 105982
9.2%
i 101434
8.8%
s 22586
 
2.0%
h 22586
 
2.0%
l 9096
 
0.8%
Other values (9) 19658
 
1.7%
Common
ValueCountFrequency (%)
(space) 101434
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1248516
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 225454
18.1%
d 202868
16.2%
r 202868
16.2%
a 128568
10.3%
e 105982
8.5%
t 105982
8.5%
i 101434
8.1%
(space) 101434
8.1%
s 22586
 
1.8%
h 22586
 
1.8%
Other values (10) 28754
 
2.3%

extra
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 28
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9341551
Minimum: -7.5
Maximum: 11.75
Zeros: 38442
Zeros (%): 29.3%
Negative: 843
Negative (%): 0.6%
Memory size: 6.0 MiB

Quantile statistics

Minimum: -7.5
5-th percentile: 0
Q1: 0
median: 1
Q3: 2.5
95-th percentile: 5
Maximum: 11.75
Range: 19.25
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 1.9519249
Coefficient of variation (CV): 1.0091873
Kurtosis: 1.9513187
Mean: 1.9341551
Median Absolute Deviation (MAD): 1.5
Skewness: 1.0989403
Sum: 253857.86
Variance: 3.8100107
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=28)
Common values

Value | Count | Frequency (%)
0 | 38442 | 29.3%
2.5 | 37622 | 28.7%
1 | 26666 | 20.3%
5 | 13002 | 9.9%
3.5 | 8799 | 6.7%
7.5 | 1771 | 1.3%
6 | 1332 | 1.0%
4.25 | 679 | 0.5%
9.25 | 617 | 0.5%
-1 | 395 | 0.3%
Other values (18) | 1925 | 1.5%

Lowest values

Value | Count | Frequency (%)
-7.5 | 14 | < 0.1%
-6 | 15 | < 0.1%
-5 | 57 | < 0.1%
-2.5 | 362 | 0.3%
-1 | 395 | 0.3%
0 | 38442 | 29.3%
0.11 | 1 | < 0.1%
0.25 | 1 | < 0.1%
0.75 | 2 | < 0.1%
1 | 26666 | 20.3%

Highest values

Value | Count | Frequency (%)
11.75 | 196 | 0.1%
10.25 | 175 | 0.1%
10 | 76 | 0.1%
9.25 | 617 | 0.5%
8.5 | 42 | < 0.1%
7.75 | 155 | 0.1%
7.5 | 1771 | 1.3%
6.75 | 210 | 0.2%
6 | 1332 | 1.0%
5.25 | 7 | < 0.1%

tip_amount
Real number (ℝ)

UNIQUE 

Distinct: 131250
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.1329069
Minimum: 0.00012939597
Maximum: 484.87615
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 MiB

Quantile statistics

Minimum: 0.00012939597
5-th percentile: 1.025461
Q1: 3.4761096
median: 5.2979448
Q3: 7.5146978
95-th percentile: 15.118223
Maximum: 484.87615
Range: 484.87602
Interquartile range (IQR): 4.0385882

Descriptive statistics

Standard deviation: 4.6573437
Coefficient of variation (CV): 0.75940231
Kurtosis: 893.83261
Mean: 6.1329069
Median Absolute Deviation (MAD): 1.9947844
Skewness: 11.313853
Sum: 804944.04
Variance: 21.69085
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
4.776779595 | 1 | < 0.1%
6.008778287 | 1 | < 0.1%
3.149878845 | 1 | < 0.1%
18.87838917 | 1 | < 0.1%
1.105655264 | 1 | < 0.1%
4.557259746 | 1 | < 0.1%
5.127701094 | 1 | < 0.1%
1.290491412 | 1 | < 0.1%
6.943745452 | 1 | < 0.1%
6.652131774 | 1 | < 0.1%
Other values (131240) | 131240 | > 99.9%

Lowest values

Value | Count | Frequency (%)
0.0001293959739 | 1 | < 0.1%
0.0002753208 | 1 | < 0.1%
0.0004350792141 | 1 | < 0.1%
0.0004601416502 | 1 | < 0.1%
0.0007132443445 | 1 | < 0.1%
0.0008942244453 | 1 | < 0.1%
0.0009188518326 | 1 | < 0.1%
0.001051727527 | 1 | < 0.1%
0.001095234922 | 1 | < 0.1%
0.00109933439 | 1 | < 0.1%

Highest values

Value | Count | Frequency (%)
484.8761506 | 1 | < 0.1%
184.3134585 | 1 | < 0.1%
170.761373 | 1 | < 0.1%
110.9277285 | 1 | < 0.1%
101.3168704 | 1 | < 0.1%
91.16362159 | 1 | < 0.1%
84.47570597 | 1 | < 0.1%
84.03261678 | 1 | < 0.1%
80.82858772 | 1 | < 0.1%
76.40339182 | 1 | < 0.1%

tolls_amount
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 184
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6505181
Minimum: -29.3
Maximum: 80
Zeros: 119444
Zeros (%): 91.0%
Negative: 92
Negative (%): 0.1%
Memory size: 6.0 MiB

Quantile statistics

Minimum: -29.3
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 6.55
Maximum: 80
Range: 109.3
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.3364782
Coefficient of variation (CV): 3.591719
Kurtosis: 54.933921
Mean: 0.6505181
Median Absolute Deviation (MAD): 0
Skewness: 5.2022792
Sum: 85380.5
Variance: 5.4591303
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 119444 | 91.0%
6.55 | 10548 | 8.0%
12.75 | 185 | 0.1%
14.75 | 169 | 0.1%
3 | 99 | 0.1%
-6.55 | 78 | 0.1%
19.3 | 66 | 0.1%
13.1 | 64 | < 0.1%
21.3 | 41 | < 0.1%
2.45 | 36 | < 0.1%
Other values (174) | 520 | 0.4%

Lowest values

Value | Count | Frequency (%)
-29.3 | 1 | < 0.1%
-21.3 | 1 | < 0.1%
-14.75 | 1 | < 0.1%
-12.75 | 1 | < 0.1%
-12.55 | 1 | < 0.1%
-10.5 | 1 | < 0.1%
-10 | 2 | < 0.1%
-8.55 | 1 | < 0.1%
-8.5 | 1 | < 0.1%
-8.3 | 2 | < 0.1%

Highest values

Value | Count | Frequency (%)
80 | 1 | < 0.1%
76 | 1 | < 0.1%
63 | 1 | < 0.1%
53 | 1 | < 0.1%
47.25 | 1 | < 0.1%
45.15 | 1 | < 0.1%
42.55 | 1 | < 0.1%
40 | 1 | < 0.1%
38.25 | 1 | < 0.1%
36.05 | 1 | < 0.1%

improvement_surcharge
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 MiB

Length

Max length: 4
Median length: 3
Mean length: 3.0100114
Min length: 3

Characters and Unicode

Total characters: 395064
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 129838 | 98.9%
-1.0 | 1314 | 1.0%
0.3 | 64 | < 0.1%
0.0 | 34 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
1.0 | 131152 | 99.9%
0.3 | 64 | < 0.1%
0.0 | 34 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
0 131284
33.2%
. 131250
33.2%
1 131152
33.2%
- 1314
 
0.3%
3 64
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 262500
66.4%
Other Punctuation 131250
33.2%
Dash Punctuation 1314
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 131284
50.0%
1 131152
50.0%
3 64
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 131250
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1314
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 395064
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 131284
33.2%
. 131250
33.2%
1 131152
33.2%
- 1314
 
0.3%
3 64
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 395064
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 131284
33.2%
. 131250
33.2%
1 131152
33.2%
- 1314
 
0.3%
3 64
 
< 0.1%

congestion_surcharge
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 4548
Missing (%): 3.5%
Memory size: 6.0 MiB

Length

Max length: 4
Median length: 3
Mean length: 3.0083108
Min length: 3

Characters and Unicode

Total characters: 381159
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.5
2nd row: 2.5
3rd row: 0.0
4th row: 2.5
5th row: 2.5

Common Values

Value | Count | Frequency (%)
2.5 | 114843 | 87.5%
0.0 | 10806 | 8.2%
-2.5 | 1053 | 0.8%
(Missing) | 4548 | 3.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
2.5 | 115896 | 91.5%
0.0 | 10806 | 8.5%

Most occurring characters

ValueCountFrequency (%)
. 126702
33.2%
2 115896
30.4%
5 115896
30.4%
0 21612
 
5.7%
- 1053
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 253404
66.5%
Other Punctuation 126702
33.2%
Dash Punctuation 1053
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 115896
45.7%
5 115896
45.7%
0 21612
 
8.5%
Other Punctuation
ValueCountFrequency (%)
. 126702
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1053
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 381159
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 126702
33.2%
2 115896
30.4%
5 115896
30.4%
0 21612
 
5.7%
- 1053
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 381159
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 126702
33.2%
2 115896
30.4%
5 115896
30.4%
0 21612
 
5.7%
- 1053
 
0.3%

Airport_fee
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 4548
Missing (%): 3.5%
Memory size: 6.0 MiB

Length

Max length: 5
Median length: 3
Mean length: 3.0962258
Min length: 3

Characters and Unicode

Total characters: 392298
Distinct characters: 6
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 114702 | 87.4%
1.75 | 11808 | 9.0%
-1.75 | 192 | 0.1%
(Missing) | 4548 | 3.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
0.0 | 114702 | 90.5%
1.75 | 12000 | 9.5%

Most occurring characters

ValueCountFrequency (%)
0 229404
58.5%
. 126702
32.3%
1 12000
 
3.1%
7 12000
 
3.1%
5 12000
 
3.1%
- 192
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 265404
67.7%
Other Punctuation 126702
32.3%
Dash Punctuation 192
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 229404
86.4%
1 12000
 
4.5%
7 12000
 
4.5%
5 12000
 
4.5%
Other Punctuation
ValueCountFrequency (%)
. 126702
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 192
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 392298
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 229404
58.5%
. 126702
32.3%
1 12000
 
3.1%
7 12000
 
3.1%
5 12000
 
3.1%
- 192
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 392298
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 229404
58.5%
. 126702
32.3%
1 12000
 
3.1%
7 12000
 
3.1%
5 12000
 
3.1%
- 192
 
< 0.1%

Interactions

(Figure: 8 × 8 grid of pairwise interaction plots for the eight numeric variables.)

Correlations

Variable | passenger_count | trip_distance | RatecodeID | PULocationID | DOLocationID | extra | tip_amount | tolls_amount | VendorID | store_and_fwd_flag | payment_type | improvement_surcharge | congestion_surcharge | Airport_fee
passenger_count | 1.000 | 0.064 | 0.079 | 0.001 | 0.000 | -0.029 | 0.009 | 0.065 | 0.228 | 0.058 | 0.039 | 0.014 | 0.017 | 0.050
trip_distance | 0.064 | 1.000 | 0.292 | -0.002 | -0.005 | 0.083 | 0.382 | 0.442 | 0.000 | 1.000 | 0.013 | 0.000 | 1.000 | 1.000
RatecodeID | 0.079 | 0.292 | 1.000 | 0.005 | -0.002 | -0.084 | 0.138 | 0.525 | 0.108 | 0.002 | 0.033 | 0.023 | 0.218 | 0.021
PULocationID | 0.001 | -0.002 | 0.005 | 1.000 | 0.001 | -0.000 | -0.001 | 0.002 | 0.000 | 0.003 | 0.000 | 0.000 | 0.000 | 0.003
DOLocationID | 0.000 | -0.005 | -0.002 | 0.001 | 1.000 | 0.002 | -0.000 | -0.002 | 0.000 | 0.000 | 0.000 | 0.004 | 0.002 | 0.000
extra | -0.029 | 0.083 | -0.084 | -0.000 | 0.002 | 1.000 | 0.101 | 0.139 | 0.412 | 0.075 | 0.205 | 0.342 | 0.396 | 0.516
tip_amount | 0.009 | 0.382 | 0.138 | -0.001 | -0.000 | 0.101 | 1.000 | 0.256 | 0.000 | 0.000 | 0.000 | 0.000 | 0.033 | 0.017
tolls_amount | 0.065 | 0.442 | 0.525 | 0.002 | -0.002 | 0.139 | 0.256 | 1.000 | 0.023 | 0.015 | 0.042 | 0.057 | 0.156 | 0.382
VendorID | 0.228 | 0.000 | 0.108 | 0.000 | 0.000 | 0.412 | 0.000 | 0.023 | 1.000 | 0.125 | 0.088 | 0.620 | 0.057 | 0.045
store_and_fwd_flag | 0.058 | 1.000 | 0.002 | 0.003 | 0.000 | 0.075 | 0.000 | 0.015 | 0.125 | 1.000 | 0.025 | 0.031 | 0.009 | 0.002
payment_type | 0.039 | 0.013 | 0.033 | 0.000 | 0.000 | 0.205 | 0.000 | 0.042 | 0.088 | 0.025 | 1.000 | 0.322 | 0.364 | 0.145
improvement_surcharge | 0.014 | 0.000 | 0.023 | 0.000 | 0.004 | 0.342 | 0.000 | 0.057 | 0.620 | 0.031 | 0.322 | 1.000 | 0.635 | 0.270
congestion_surcharge | 0.017 | 1.000 | 0.218 | 0.000 | 0.002 | 0.396 | 0.033 | 0.156 | 0.057 | 0.009 | 0.364 | 0.635 | 1.000 | 0.329
Airport_fee | 0.050 | 1.000 | 0.021 | 0.003 | 0.000 | 0.516 | 0.017 | 0.382 | 0.045 | 0.002 | 0.145 | 0.270 | 0.329 | 1.000
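
The High correlation alerts in the Overview can be reproduced from this matrix by listing the off-diagonal pairs whose absolute value meets a cut-off. The cut-off of 0.5 used below is an assumption that happens to match the alerts; the report does not state it. `high_pairs` is a hypothetical helper written for this illustration.

```python
names = ["passenger_count", "trip_distance", "RatecodeID", "PULocationID",
         "DOLocationID", "extra", "tip_amount", "tolls_amount", "VendorID",
         "store_and_fwd_flag", "payment_type", "improvement_surcharge",
         "congestion_surcharge", "Airport_fee"]

# Rows of the correlation matrix, in the same order as `names`.
matrix = [
    [1.000, 0.064, 0.079, 0.001, 0.000, -0.029, 0.009, 0.065, 0.228, 0.058, 0.039, 0.014, 0.017, 0.050],
    [0.064, 1.000, 0.292, -0.002, -0.005, 0.083, 0.382, 0.442, 0.000, 1.000, 0.013, 0.000, 1.000, 1.000],
    [0.079, 0.292, 1.000, 0.005, -0.002, -0.084, 0.138, 0.525, 0.108, 0.002, 0.033, 0.023, 0.218, 0.021],
    [0.001, -0.002, 0.005, 1.000, 0.001, -0.000, -0.001, 0.002, 0.000, 0.003, 0.000, 0.000, 0.000, 0.003],
    [0.000, -0.005, -0.002, 0.001, 1.000, 0.002, -0.000, -0.002, 0.000, 0.000, 0.000, 0.004, 0.002, 0.000],
    [-0.029, 0.083, -0.084, -0.000, 0.002, 1.000, 0.101, 0.139, 0.412, 0.075, 0.205, 0.342, 0.396, 0.516],
    [0.009, 0.382, 0.138, -0.001, -0.000, 0.101, 1.000, 0.256, 0.000, 0.000, 0.000, 0.000, 0.033, 0.017],
    [0.065, 0.442, 0.525, 0.002, -0.002, 0.139, 0.256, 1.000, 0.023, 0.015, 0.042, 0.057, 0.156, 0.382],
    [0.228, 0.000, 0.108, 0.000, 0.000, 0.412, 0.000, 0.023, 1.000, 0.125, 0.088, 0.620, 0.057, 0.045],
    [0.058, 1.000, 0.002, 0.003, 0.000, 0.075, 0.000, 0.015, 0.125, 1.000, 0.025, 0.031, 0.009, 0.002],
    [0.039, 0.013, 0.033, 0.000, 0.000, 0.205, 0.000, 0.042, 0.088, 0.025, 1.000, 0.322, 0.364, 0.145],
    [0.014, 0.000, 0.023, 0.000, 0.004, 0.342, 0.000, 0.057, 0.620, 0.031, 0.322, 1.000, 0.635, 0.270],
    [0.017, 1.000, 0.218, 0.000, 0.002, 0.396, 0.033, 0.156, 0.057, 0.009, 0.364, 0.635, 1.000, 0.329],
    [0.050, 1.000, 0.021, 0.003, 0.000, 0.516, 0.017, 0.382, 0.045, 0.002, 0.145, 0.270, 0.329, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; not stated in the report

def high_pairs(names, matrix, threshold):
    """Return (a, b, value) for each unordered pair with |value| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

pairs = high_pairs(names, matrix, THRESHOLD)
for a, b, value in pairs:
    print(f"{a} ~ {b}: {value:.3f}")
```

With this cut-off the listing contains seven pairs, the same variable pairings named in the alerts.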

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

VendorIDtpep_pickup_datetimetpep_dropoff_datetimepassenger_counttrip_distanceRatecodeIDstore_and_fwd_flagPULocationIDDOLocationIDpayment_typeextratip_amounttolls_amountimprovement_surchargecongestion_surchargeAirport_fee
9436602023-06-29 18:18:352023-06-29 19:27:451.01.301.0N33137Credit Card5.04.7767800.001.02.50.0
1177702023-06-30 12:18:422023-06-30 12:33:141.010.801.0N73114Credit Card7.58.4763916.551.02.50.0
3333012023-06-29 10:30:272023-06-29 10:42:520.00.005.0N23845Credit Card0.01.1926496.551.00.00.0
15825312023-06-28 18:10:272023-06-28 17:23:271.03.651.0N389Credit Card2.58.2729610.001.02.50.0
11402012023-06-29 07:53:012023-06-29 08:46:261.03.431.0N141233Credit Card0.07.1025200.001.02.50.0
16933512023-06-29 13:13:512023-06-29 14:25:392.01.581.0N5170Credit Card0.03.3391510.001.02.50.0
972012023-06-29 23:02:272023-06-29 22:34:292.01.221.0N171167Credit Card1.04.0132050.001.02.50.0
862602023-06-30 13:31:112023-06-30 13:32:411.01.901.0N14446Credit Card2.56.6818730.001.02.50.0
3571212023-06-30 00:31:392023-06-30 01:48:491.04.591.0N35146Credit Card1.09.8391120.001.02.50.0
9657212023-06-30 17:24:462023-06-30 16:42:251.01.721.0N84178Cash2.52.7408250.001.02.50.0
VendorIDtpep_pickup_datetimetpep_dropoff_datetimepassenger_counttrip_distanceRatecodeIDstore_and_fwd_flagPULocationIDDOLocationIDpayment_typeextratip_amounttolls_amountimprovement_surchargecongestion_surchargeAirport_fee
990902023-06-30 06:37:072023-06-30 06:18:390.01.601.0N3234Credit Card2.501.7622310.001.02.50.00
2145302023-06-29 07:29:232023-06-29 06:36:361.01.401.0N121225Cash2.502.8068210.001.02.50.00
15081012023-06-28 17:37:382023-06-28 19:07:471.09.881.0N102149Credit Card7.505.9919836.551.02.51.75
10142412023-06-30 18:17:022023-06-30 18:13:411.01.641.0N21973Credit Card2.504.2210790.001.02.50.00
15225212023-06-29 20:20:162023-06-29 21:41:021.00.971.0N1268Credit Card1.003.5137160.001.02.50.00
885202023-06-29 13:57:292023-06-29 13:27:251.00.301.0N176167Cash2.503.7803240.001.02.50.00
6975212023-06-29 09:05:512023-06-29 09:30:172.06.811.0N6811Credit Card0.000.6952600.001.02.50.00
15282712023-06-30 15:40:442023-06-30 15:06:361.01.371.0N186198Credit Card0.003.9635550.001.02.50.00
8032102023-06-28 18:22:082023-06-28 19:48:180.09.301.0N164118Credit Card11.7513.4661046.551.02.51.75
9021512023-06-29 17:24:182023-06-29 19:16:29NaN1.76NaNNaN218202Wallet0.004.9017280.001.0NaNNaN
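
Several of the sampled rows show a drop-off time earlier than the pickup time. A simple consistency check, sketched below on five pickup and drop-off pairs copied from the sample, flags such rows by computing the trip duration. The helper name `duration_minutes` is hypothetical, and the check is not part of the report itself.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (pickup, dropoff) pairs copied from five of the sampled rows above.
trips = [
    ("2023-06-29 18:18:35", "2023-06-29 19:27:45"),
    ("2023-06-30 12:18:42", "2023-06-30 12:33:14"),
    ("2023-06-29 10:30:27", "2023-06-29 10:42:52"),
    ("2023-06-28 18:10:27", "2023-06-28 17:23:27"),
    ("2023-06-29 23:02:27", "2023-06-29 22:34:29"),
]

def duration_minutes(pickup, dropoff):
    """Return the trip duration in minutes; negative if dropoff precedes pickup."""
    delta = datetime.strptime(dropoff, FMT) - datetime.strptime(pickup, FMT)
    return delta.total_seconds() / 60

suspect = [(p, d) for p, d in trips if duration_minutes(p, d) < 0]
for pickup, dropoff in suspect:
    print(f"dropoff {dropoff} precedes pickup {pickup}")
```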